
UCT/ROCM/IPC: Use correct agent for IPC copies #4516

Merged: @yosefe merged 1 commit into openucx:master from souravzzz:topic/sourav/rocm-ipc-agent on Dec 4, 2019


@souravzzz (Member) commented Nov 27, 2019

What

Use local GPU agent as the source and destination for ROCM-IPC copies

Why

  • Fix incorrect agent selection when not every HSA agent is visible to every rank
  • Fix failure when different ranks are assigned unequal numbers of GPUs
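The two failure modes above can be illustrated with a small, self-contained C model. This is a hypothetical sketch, not code from UCX or the HSA runtime: `agent_t`, `rank_view_t`, `agent_from_peer_index()` and `select_copy_agents()` are illustrative names, and the "peer index" lookup is an assumed example of a fragile pattern, not the code this PR replaced. It assumes each rank enumerates only the GPUs visible to it, so an index that is valid on one rank can name a different GPU, or no GPU at all, on another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an HSA agent handle (NOT the real hsa_agent_t). */
typedef struct {
    int device_id; /* physical GPU this agent represents */
} agent_t;

/* Hypothetical per-rank view: only the GPUs visible to this rank are listed,
 * numbered from 0 in this rank's own enumeration order. */
typedef struct {
    const agent_t *agents;
    size_t         num_agents;
    size_t         local_index; /* the GPU this rank is using */
} rank_view_t;

/* Fragile pattern, assumed for illustration only: reuse an index received
 * from a peer rank to look up an agent in the local enumeration.
 * Returns NULL if the index does not exist locally (the peer sees more GPUs);
 * otherwise it may silently return a different physical GPU. */
const agent_t *agent_from_peer_index(const rank_view_t *view, size_t peer_index)
{
    if (peer_index >= view->num_agents) {
        return NULL;
    }
    return &view->agents[peer_index];
}

/* Approach described in this PR: use the rank's own (local) GPU agent as
 * both the source and the destination agent of the IPC copy. */
void select_copy_agents(const rank_view_t *view,
                        const agent_t **src_agent, const agent_t **dst_agent)
{
    assert(view->local_index < view->num_agents); /* local GPU must exist */
    *src_agent = &view->agents[view->local_index];
    *dst_agent = &view->agents[view->local_index];
}
```

For example, if rank A sees two GPUs and uses the one at its index 1 while rank B sees only one GPU, looking up A's index 1 on rank B fails, and B's index 0 names a different physical GPU on A. Choosing the local agent never depends on another rank's enumeration. In an HSA-based implementation the handles would presumably be `hsa_agent_t` values, such as those passed to `hsa_amd_memory_async_copy()`, which takes separate destination and source agents; how UCX itself performs the copy is not shown here.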

@swx-jenkins3 (Collaborator) commented

Can one of the admins verify this patch?

@shamisp shamisp requested a review from paklui November 27, 2019 17:45
@shamisp (Contributor) commented Nov 27, 2019

You probably want to add OSU copyright to the file

@souravzzz (Member, Author) commented

Hi @shamisp, I am working at AMD now.

@shamisp (Contributor) commented Nov 27, 2019

@souravzzz - congratulations !!! :)

@yosefe (Contributor) commented Nov 28, 2019

ok to test

@mellanox-github (Contributor) commented

Mellanox CI: FAILED on 2 of 25 workers

Note: the logs will be deleted after 05-Dec-2019

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-hwi-jenkins_W2 ❓ ABORTED
hpc-test-node-legacy_W0 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W1 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W2 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W3 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W0 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W1 ❓ UNKNOWN
hpc-test-node-gpu_W0 ❓ UNKNOWN
hpc-test-node-gpu_W1 ❓ UNKNOWN
hpc-test-node-gpu_W2 ❓ UNKNOWN
hpc-test-node-gpu_W3 ❓ UNKNOWN
hpc-test-node-legacy_W1 ❓ UNKNOWN
hpc-test-node-legacy_W2 ❓ UNKNOWN
hpc-test-node-new_W0 ❓ UNKNOWN
hpc-test-node-new_W1 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN
hpc-test-node-new_W3 ❓ UNKNOWN

@mellanox-github (Contributor) commented

Mellanox CI: PASSED on 25 workers

Note: the logs will be deleted after 05-Dec-2019

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@souravzzz (Member, Author) commented

@yosefe @shamisp Is anything else required to merge this PR?

@shamisp (Contributor) commented Dec 4, 2019

@yosefe ?

@yosefe yosefe merged commit 0894ca7 into openucx:master Dec 4, 2019
@souravzzz souravzzz deleted the topic/sourav/rocm-ipc-agent branch December 4, 2019 22:24
6 participants